 regression estimator


A Unified Framework for Provably Efficient Algorithms to Estimate Shapley Values

Neural Information Processing Systems

Shapley values have emerged as a critical tool for explaining which features impact the decisions made by machine learning models. However, computing exact Shapley values is difficult, generally requiring a number of model evaluations that is exponential in the feature dimension. To address this, many model-agnostic randomized estimators have been developed, the most influential and widely used being the KernelSHAP method (Lundberg & Lee, 2017). While related estimators such as unbiased KernelSHAP (Covert & Lee, 2021) and LeverageSHAP (Musco & Witter, 2025) are known to satisfy theoretical guarantees, bounds for KernelSHAP have remained elusive. We describe a broad and unified framework that encompasses KernelSHAP and related estimators constructed using both with-replacement and without-replacement sampling strategies.




A Unified Framework for Provably Efficient Algorithms to Estimate Shapley Values

arXiv.org Artificial Intelligence

Shapley values have emerged as a critical tool for explaining which features impact the decisions made by machine learning models. However, computing exact Shapley values is difficult, generally requiring a number of model evaluations that is exponential in the feature dimension. To address this, many model-agnostic randomized estimators have been developed, the most influential and widely used being the KernelSHAP method (Lundberg & Lee, 2017). While related estimators such as unbiased KernelSHAP (Covert & Lee, 2021) and LeverageSHAP (Musco & Witter, 2025) are known to satisfy theoretical guarantees, bounds for KernelSHAP have remained elusive. We describe a broad and unified framework that encompasses KernelSHAP and related estimators constructed using both with-replacement and without-replacement sampling strategies. We then prove strong non-asymptotic theoretical guarantees that apply to all estimators from our framework. This provides, to the best of our knowledge, the first theoretical guarantees for KernelSHAP and sheds further light on tradeoffs between existing estimators. Through comprehensive benchmarking on small- and medium-dimensional datasets for decision-tree models, we validate our approach against exact Shapley values, consistently achieving low mean squared error with modest sample sizes. Furthermore, we make specific implementation improvements to enable scalability of our methods to high-dimensional datasets. Our methods, tested on datasets such as MNIST and CIFAR10, provide consistently better results compared to the KernelSHAP library.
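To make the exponential cost mentioned in the abstract concrete, here is a minimal sketch that computes exact Shapley values by enumerating every feature subset. The names `exact_shapley` and `toy_value` are illustrative assumptions, not part of the paper or of any SHAP library. The toy value function stands in for a model evaluated with only the features in `S` present.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value, d):
    """Exact Shapley values of a set function `value` over features 0..d-1.

    Returns (phi, n_evals), where n_evals counts the calls made to `value`.
    """
    # Evaluate v(S) once for each of the 2^d subsets: this is the exponential cost.
    cache = {}
    for k in range(d + 1):
        for subset in combinations(range(d), k):
            cache[frozenset(subset)] = value(frozenset(subset))

    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(d - k - 1) / factorial(d)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (cache[s | {i}] - cache[s])
    return phi, len(cache)


def toy_value(S):
    """Hypothetical value function: additive effects plus one interaction."""
    v = 0.0
    if 0 in S:
        v += 1.0
    if 1 in S:
        v += 2.0
    if 0 in S and 1 in S:
        v += 2.0  # interaction term, shared between features 0 and 1
    if 2 in S:
        v -= 1.0
    return v


phi, n_evals = exact_shapley(toy_value, 3)
print(phi, n_evals)
```

The number of evaluations doubles with each added feature, which is why sampled estimators such as KernelSHAP evaluate only a subset of coalitions.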




Non-asymptotic analysis of the performance of the penalized least trimmed squares in sparse models

arXiv.org Machine Learning

The least trimmed squares (LTS) estimator is a well-known robust alternative to the classical least squares estimator and is popular in the location, regression, machine learning, and AI literature. Many studies of LTS exist, covering its robustness, computational algorithms, extensions to nonlinear cases, asymptotics, and more. LTS has been applied to penalized regression in high-dimensional, real-data, sparse-model settings where the dimension $p$ (in the thousands) is much larger than the sample size $n$ (in the tens or hundreds). In such practical settings, $n$ is often the size of a sub-population with a special attribute (e.g., the number of patients with Alzheimer's, Parkinson's, leukemia, or ALS) within a population of finite fixed size $N$. Asymptotic analysis that assumes $n$ tends to infinity is neither convincing nor legitimate in such a scenario; a non-asymptotic, finite-sample analysis is more desirable and feasible. This article establishes, for the first time, finite-sample (non-asymptotic) error bounds that hold with high probability for estimation and prediction based on LTS.
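For readers unfamiliar with LTS, the following sketch illustrates the plain (unpenalized) objective in the simplest setting of one covariate plus an intercept: choose the line that minimizes the sum of the `h` smallest squared residuals. It is not the penalized high-dimensional estimator analyzed in the article. The function names are hypothetical, the refinement loop uses concentration steps in the style of FAST-LTS, and starting from every pair of points is a simplification that is only feasible for tiny datasets.

```python
from itertools import combinations


def ols_line(points):
    """Ordinary least-squares (intercept, slope) for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)  # assumed nonzero
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return my - slope * mx, slope


def trimmed_objective(points, a, b, h):
    """Sum of the h smallest squared residuals of the line y = a + b x."""
    squared = sorted((y - a - b * x) ** 2 for x, y in points)
    return sum(squared[:h])


def lts_line(points, h, max_steps=50):
    """LTS fit via concentration steps started from every pair of points."""
    best = (float("inf"), None, None)
    for start in combinations(points, 2):
        if start[0][0] == start[1][0]:
            continue  # two points with equal x do not determine a slope
        subset = list(start)
        current = float("inf")
        for _ in range(max_steps):
            a, b = ols_line(subset)
            obj = trimmed_objective(points, a, b, h)
            if obj < best[0]:
                best = (obj, a, b)
            if obj >= current:
                break  # no further improvement from this start
            current = obj
            # Concentration step: keep the h points this line fits best.
            subset = sorted(points, key=lambda p: (p[1] - a - b * p[0]) ** 2)[:h]
    return best[1], best[2]


# Toy data: eight points on y = 1 + 2x and two gross outliers at high leverage.
points = [(x, 1 + 2 * x) for x in range(8)] + [(8, -20), (9, -25)]
print("OLS:", ols_line(points))
print("LTS:", lts_line(points, h=7))
```

On this toy data the two outliers pull the ordinary least-squares slope far from 2, while the trimmed fit ignores them because they never enter the best subset of seven points.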


Debiased Nonparametric Regression for Statistical Inference and Distributionally Robustness

arXiv.org Machine Learning

This study proposes a debiasing method for smooth nonparametric estimators. While machine learning techniques such as random forests and neural networks have demonstrated strong predictive performance, their theoretical properties remain relatively underexplored. Specifically, many modern algorithms lack assurances of pointwise asymptotic normality and uniform convergence, which are critical for statistical inference and robustness under covariate shift and which are well established for classical methods like Nadaraya-Watson regression. To address this, we introduce a model-free debiasing method that guarantees these properties for smooth estimators derived from any nonparametric regression approach. By adding a correction term that estimates the conditional expected residual of the original estimator (equivalently, its estimation error), we obtain a debiased estimator with proven pointwise asymptotic normality and uniform convergence. These properties enable statistical inference and enhance robustness to covariate shift, making the method broadly applicable to a wide range of nonparametric regression problems.
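The structure described in the abstract, a base prediction plus an estimate of its conditional expected residual, can be sketched as follows. This is an illustration under stated assumptions, not the paper's construction: the residuals are smoothed with a Gaussian-kernel Nadaraya-Watson estimator, the bandwidth is arbitrary, residuals are computed in-sample, and the names `nw_smooth` and `debiased_predict` are hypothetical.

```python
from math import exp


def nw_smooth(xs, ys, x0, bandwidth):
    """Nadaraya-Watson estimate of E[y | x = x0] with a Gaussian kernel."""
    weights = [exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def debiased_predict(base_predict, xs, ys, x0, bandwidth=0.1):
    """Base prediction plus a kernel estimate of its conditional mean residual."""
    residuals = [y - base_predict(x) for x, y in zip(xs, ys)]
    correction = nw_smooth(xs, residuals, x0, bandwidth)
    return base_predict(x0) + correction


# Noise-free toy data from f(x) = x^2 and a deliberately biased base estimator
# that always predicts the sample mean of y.
xs = [i / 50 for i in range(51)]
ys = [x ** 2 for x in xs]
mean_y = sum(ys) / len(ys)


def base(x):
    return mean_y


x0 = 0.5
print("truth:", x0 ** 2)
print("base:", base(x0))
print("debiased:", debiased_predict(base, xs, ys, x0))
```

The constant base estimator ignores `x` entirely, so its error at `x0` is visible in the residuals near `x0`, and the smoothed residual moves the prediction toward the truth.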


Double Cross-fit Doubly Robust Estimators: Beyond Series Regression

arXiv.org Machine Learning

Doubly robust estimators with cross-fitting have gained popularity in causal inference due to their favorable structure-agnostic error guarantees. However, when additional structure, such as Hölder smoothness, is available, more accurate "double cross-fit doubly robust" (DCDR) estimators can be constructed by splitting the training data and undersmoothing nuisance function estimators on independent samples. We study a DCDR estimator of the Expected Conditional Covariance, a functional of interest in causal inference and conditional independence testing, and derive a series of increasingly powerful results with progressively stronger assumptions. We first provide a structure-agnostic error analysis for the DCDR estimator with no assumptions on the nuisance functions or their estimators. Then, assuming the nuisance functions are Hölder smooth, but without assuming knowledge of the true smoothness level or the covariate density, we establish that DCDR estimators with several linear smoothers are semiparametric efficient under minimal conditions and achieve fast convergence rates in the non-$\sqrt{n}$ regime. When the covariate density and smoothnesses are known, we propose a minimax rate-optimal DCDR estimator based on undersmoothed kernel regression. Moreover, we show an undersmoothed DCDR estimator satisfies a slower-than-$\sqrt{n}$ central limit theorem, and that inference is possible even in the non-$\sqrt{n}$ regime. Finally, we support our theoretical results with simulations, providing intuition for double cross-fitting and undersmoothing, demonstrating where our estimator achieves semiparametric efficiency while the usual "single cross-fit" estimator fails, and illustrating asymptotic normality for the undersmoothed DCDR estimator.
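The data-splitting idea can be sketched for the Expected Conditional Covariance, written here as E[Cov(A, Y | X)] = E[(A - π(X))(Y - μ(X))] with π(x) = E[A | X = x] and μ(x) = E[Y | X = x]. This notation is an assumption introduced for the example. The sketch fits the two nuisance functions on two separate folds and averages the product of residuals on a third. The kernel smoother, the bandwidth, the simulated data, and all function names are illustrative choices rather than the paper's estimator, and a fuller version would also rotate the roles of the folds.

```python
import random
from math import exp


def nw_fit(xs, ys, bandwidth):
    """Return a Gaussian-kernel Nadaraya-Watson regression function."""
    def predict(x0):
        w = [exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
        return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    return predict


def dcdr_ecc(data, bandwidth=0.05):
    """Estimate E[Cov(A, Y | X)] from (x, a, y) triples using three folds.

    pi_hat and mu_hat are trained on independent folds, and the estimate is
    the mean product of residuals on a third, held-out fold.
    """
    n = len(data) // 3
    fold_pi, fold_mu, fold_est = data[:n], data[n:2 * n], data[2 * n:3 * n]
    pi_hat = nw_fit([d[0] for d in fold_pi], [d[1] for d in fold_pi], bandwidth)
    mu_hat = nw_fit([d[0] for d in fold_mu], [d[2] for d in fold_mu], bandwidth)
    terms = [(a - pi_hat(x)) * (y - mu_hat(x)) for x, a, y in fold_est]
    return sum(terms) / len(terms)


# Simulated data in which Cov(A, Y | X) = 0.5 for every x (by construction).
rng = random.Random(0)
data = []
for _ in range(900):
    x = rng.random()
    e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)
    data.append((x, x + e1, x * x + 0.5 * e1 + e2))

estimate = dcdr_ecc(data)
print("estimated ECC:", estimate, "(simulation truth: 0.5)")
```

Because the two nuisance fits use independent samples, their estimation errors are independent given the covariates, which is the property the double cross-fit construction relies on.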


A Polynomial-time Form of Robust Regression

Neural Information Processing Systems

Despite the variety of robust regression methods that have been developed, current regression formulations are either NP-hard or allow unbounded response to even a single leverage point. We present a general formulation for robust regression, Variational M-estimation, that unifies a number of robust regression methods while allowing a tractable approximation strategy. We develop an estimator that requires only polynomial time while achieving certain robustness and consistency guarantees. An experimental evaluation demonstrates the effectiveness of the new estimation approach compared to standard methods.